We propose a generic model for multiple choice situations in the presence of herding and compare
it with recent empirical results from a Web-based music market experiment. The model predicts
a phase transition between a weak imitation phase and a strong imitation, 'fashion' phase, where
choices are driven by peer pressure and the ranking of individual preferences is strongly distorted
at the aggregate level. The model can be calibrated to reproduce the main experimental results of
Salganik et al. (Science, 311, pp. 854-856 (2006)); we show in particular that the value of the social
influence parameter can be estimated from the data. In one of the experimental situations, this value
is found to be close to the critical value of the model.

I. INTRODUCTION 

Making decisions is part of everyday life. Some situations require a binary choice (e.g. to vote yes or no in a
referendum, to buy or not to buy a cell phone, to join or not to join a riot, etc.). Many others involve multiple
options, for example in the first round of French presidential elections (where the number of candidates is typically 15),
in portfolio management where very many stocks are eligible, in supermarkets where the number of possible products
to buy is large, etc. In most cases, the choice is constrained by some generalized budget constraint, either strictly
(at most one candidate in the French presidential election) or softly (the total spending in a supermarket should on
average be smaller than some amount). It is common experience that people generally do not determine their action
in isolation. Quite the contrary, interactions and herding effects often strongly distort individual preferences, and
are clearly responsible for the appearance of trends, fashions and bubbles that would be difficult to understand if
agents were insensitive to the behaviour of their peers. Catastrophic events (such as crashes, or sudden opinion shifts)
can occur at the macro level, induced by imitation, whereas the aggregate behaviour of independent agents would be
perfectly smooth.

A relevant challenge in the present era of information economy is to be able to extract faithfully individual
opinions/tastes from the publicly expressed preferences under the influence of the crowd. For example, book reviewers
on Amazon may be biased by the opinion expressed by previous reviews; if imitation effects are too strong,
overwhelmingly positive (or negative) reviews cannot be trusted (see [?]), as a result of "information cascades" [?]. In
the case of financial markets, strong herding effects in the earning forecasts of financial analysts have been reported:
the dispersion of these forecasts is typically ten times smaller than the ex post difference between the forecast and
the actual earning (see [?] and refs. therein). These herding effects may lead to a complete divergence between the
market price and any putative 'rational' price. In the context of scientific publications, the substitution of the present
refereeing process by other assessment tools, such as number of downloads from a preprint web-page, or number of
citations, is also prone to strong, winner-takes-all, distortions. More generally, it is plausible that such herding
phenomena play a role in the appearance of Pareto-tails in the measure of success (wealth, income, book sales, movie
attendance, etc.).

Despite their importance, already stressed long ago by Keynes and more recently by Schelling [?], quantitative
models of herding and interaction effects have only been explored, in different contexts, in the recent past (see [?]).
This category of models has in fact a long history in physics, where interaction is indeed
at the root of genuinely collective effects in condensed matter, such as ferromagnetism, superconductivity, etc. One
particular model, which appears to be particularly interesting and generic, is the so-called 'Random Field Ising Model'
(RFIM) [?], which models the dynamics of magnets under the influence of a slowly evolving external solicitation. This
model can be transposed to a socio-economic context [?, 18, 19] to represent a binary decision situation under
social pressure. A robust feature of the model is that discontinuities appear in aggregate quantities when imitation
effects exceed a certain threshold, even if the external solicitation varies smoothly with time. Below this threshold, the
behaviour of demand, or of the average opinion, is smooth, but the natural trends can be substantially amplified by
peer pressure. The predictions of the RFIM can be compared, with some success, with empirical observations concerning
sales of cell phones, birth rates and the terminal phase of clapping in concert halls [19].

Here, we want to generalize the RFIM to multiple choice situations. One motivation is that, as mentioned above, 
these situations are extremely common. A more precise incentive for such a generalization is however the recent 
publication of a remarkable experimental paper by Salganik, Dodds and Watts [20]. In order to detect and quantify
social influence effects, the authors have conducted a careful Web-based experiment (described below) with several 
quantitative results. Their detailed interpretation begs for a specific model, which we introduce and discuss in this 
paper and compare with these empirical results. The model is found to fare quite well and allows one to extract 
from the data a quantitative estimate of the imitation strength, called J below. Interestingly, one of the situations 
corresponds to a value of J close to the critical point of the model, where collective effects become dominant and 
strongly distort individual preferences. 



II. THE MODEL 



We consider $N$ agents indexed by Roman labels $i = 1, \dots, N$, and $M$ items indexed by Greek labels $\alpha = 1, \dots, M$.
Each agent can construct his 'shopping list' or portfolio of items; for simplicity, we restrict here to cases where the
quantity of item $\alpha$ is either zero or unity (in the example of movies, we neglect the possibility of going twice to see the
same movie). The portfolio of agent $i$ is therefore a vector of size $M$: $\{n_i^\alpha\}$ with $n_i^\alpha = 0, 1$. The "budget constraint"
can in general be written as:

$$B_i^- \le \sum_{\alpha=1}^{M} n_i^\alpha \le B_i^+, \qquad (1)$$

where the budget might be different for different agents.
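As a minimal illustration of the budget constraint (1), the sketch below checks whether a 0/1 portfolio respects a pair of bounds. The function name `satisfies_budget` and the argument names `b_min` and `b_max` are hypothetical and only stand for the lower and upper budget of one agent.

```python
def satisfies_budget(portfolio, b_min, b_max):
    """Return True if the number of chosen items lies in [b_min, b_max].

    portfolio: list of 0/1 entries n_i^alpha for one agent (length M).
    b_min, b_max: hypothetical names for the agent's lower and upper budget.
    """
    if any(n not in (0, 1) for n in portfolio):
        raise ValueError("each n_i^alpha must be 0 or 1")
    return b_min <= sum(portfolio) <= b_max


# Strict constraint, as in an election: at most one item may be chosen.
vote = [0, 0, 1, 0]
basket = [1, 1, 0, 1]
print(satisfies_budget(vote, 0, 1))    # True
print(satisfies_budget(basket, 0, 1))  # False
```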

The choices made by agent i are assumed to be determined by three different factors: 

• a piece of public information affecting all agents equally, measuring the intrinsic attractivity of item $\alpha$. This is
modeled by a real variable $F^\alpha$, which may contain, for example, the price of the product (low price means large
$F^\alpha$), or its technological performance, past reputation, etc.

• an idiosyncratic part describing the preferences/tastes of agent $i$, in the absence of any social pressure or
imitation effects. This part is again modeled by a real variable $h_i^\alpha$, which is positive and large if agent $i$ is
particularly fond of item $\alpha$.

• a social pressure/imitation term which describes how the choices made by others affect the perception of item
$\alpha$ by agent $i$. In full generality, we can write this term as:

$$\sum_{j} \sum_{\beta} J_{ij}^{\alpha\beta}\, n_j^\beta, \qquad (2)$$

where $J_{ij}^{\alpha\beta}$ measures the influence of the consumption of product $\beta$ by agent $j$. Positive $J_{ij}^{\alpha\beta}$'s describe herding-like
effects (which could exist across different products), whereas negative $J_{ij}^{\alpha\beta}$'s are related to contrarian effects
(for example, agent $j$ buying item $\beta$ might push the price of item $\alpha$ up). We will consider in this paper a
simplified version of the model where only the aggregate consumption of item $\alpha$ itself influences the value of $n_i^\alpha$,
i.e.:

$$J_{ij}^{\alpha\beta} = \frac{J M}{C}\, \delta_{\alpha\beta}, \qquad (3)$$

where the factor $M$ is introduced for convenience and $C$ is the total consumption, defined as:

$$C = \sum_{i} \sum_{\alpha} n_i^\alpha. \qquad (4)$$

We will also introduce the total consumption of item $\alpha$ as $C^\alpha = \sum_i n_i^\alpha$, and the relative consumption (or success rate)
$\phi^\alpha = C^\alpha / C$, with $\sum_\alpha \phi^\alpha = 1$.
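The aggregate quantities just defined can be computed directly from the table of individual choices. The sketch below is only an illustration: it assumes the choices are stored as a list of rows, one per agent, with one 0/1 entry per item, and the helper name `consumption_stats` is hypothetical.

```python
def consumption_stats(n):
    """n[i][a] = 1 if agent i consumes item a, else 0 (N rows, M columns).

    Returns the total consumption C, the per-item consumptions C^alpha
    and the success rates phi^alpha = C^alpha / C.
    """
    M = len(n[0])
    per_item = [sum(row[a] for row in n) for a in range(M)]  # C^alpha
    total = sum(per_item)                                     # C
    if total == 0:
        raise ValueError("no consumption: success rates are undefined")
    phi = [c / total for c in per_item]                       # phi^alpha
    return total, per_item, phi


# Four agents, three items.
n = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 0],
     [1, 0, 0]]
total, per_item, phi = consumption_stats(n)
print(total, per_item, phi)  # 5 [3, 2, 0] [0.6, 0.4, 0.0]
```

By construction the success rates sum to one, which is the normalization used in the rest of the text.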

We assume that the consumption of item $\alpha$ by agent $i$ is effective if the sum of these three determining factors
exceeds a certain threshold, and consider the following update rule for the $n_i^\alpha$'s [27]:

$$n_i^\alpha(t+1) = \Theta\!\left[ F^\alpha + h_i^\alpha + J M \left( \phi^\alpha(t) - \frac{1}{M} \right) - b_i(t) \right], \qquad (5)$$



where $\Theta$ is the Heaviside function, $\Theta(u > 0) = 1$ and $\Theta(u < 0) = 0$. In the above equation, we have added a 'chemical
potential' $b_i$ (borrowing from the statistical physics jargon) which allows the budget constraint to be satisfied at all
times [21]. The $-1/M$ term was added for convenience, and makes explicit that it is the consumption of item $\alpha$ in
comparison with its expected average $1/M$ that generates a signal (see also [?]). It is easy to check that the case
$M = 1$, with $J \to JC/N$, corresponds to the standard RFIM considered in [?]. Note also that the $\Theta$ function describes
a deterministic situation: as soon as the total 'utility' of item $\alpha$ is positive for agent $i$, consumption is effective. One
could choose a probabilistic situation where $\Theta(u)$ is replaced by a smoothed step function, for example:

$$\Theta(u) \to \frac{1}{1 + e^{-\beta u}}. \qquad (6)$$

The limit $\beta \to \infty$ corresponds to the deterministic rule, to which we will restrict throughout this paper.
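The approach to the deterministic rule can be illustrated numerically. The sketch below assumes that the smoothed step is the logistic function $1/(1 + e^{-\beta u})$, which is one common choice and is taken here as an assumption; the function names are hypothetical.

```python
import math


def heaviside(u):
    """Deterministic rule: 1 if the total utility u is positive, else 0."""
    return 1.0 if u > 0 else 0.0


def smoothed_step(u, beta):
    """Logistic function 1 / (1 + exp(-beta * u)) (assumed smoothing).

    The two branches avoid overflow in exp() for large |beta * u|.
    """
    x = beta * u
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# As beta grows, the probability of consuming tends to 0 or 1.
for beta in (1.0, 10.0, 100.0, 1000.0):
    print(beta, smoothed_step(0.1, beta), smoothed_step(-0.1, beta))
```

For a fixed non-zero utility, the printed values move towards `heaviside(0.1)` and `heaviside(-0.1)` as `beta` increases, while `smoothed_step(0, beta)` stays at one half for any `beta`.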

In the following, we assume that both the $F^\alpha$'s and the $h_i^\alpha$'s are time independent, and taken from some statistical distributions
which we have to specify. Here again, the number of possibilities is very large, and corresponds to different situations.
We choose the $F^\alpha$'s as IID random variables (for example Gaussian), with mean $m_F$ and variance $\Sigma_F^2$. The mean $m_F$
describes the average intrinsic attractivity of items; for example, a large overall inflation would lead to a negative
$m_F$. The dispersion in quality of the different items is captured by $\Sigma_F$. More realistic models should include some
sort of 'sectorial' correlations between the $F^\alpha$'s.

As for the $h_i^\alpha$'s, we posit that they can be decomposed as $h_i^\alpha = h_i + \delta h_i^\alpha$, where $h_i$ describes the propensity of agent $i$
for consumption ('compulsive buyers' correspond to large positive $h_i$'s), whereas $\delta h_i^\alpha$ corresponds to the idiosyncratic
tastes of agent $i$, defined to have zero mean. For simplicity, we again assume that both the $h_i$'s and the $\delta h_i^\alpha$'s are IID; without
loss of generality we can assume that the average (over $i$) of $h_i$ is zero (a non-zero value could be reabsorbed into $m_F$).
The variance of $h_i$ is $\Sigma^2$ and that of $\delta h_i^\alpha$ is $\sigma^2$. Since in the limit $\beta \to \infty$ considered in this paper the overall scale of the
fields is irrelevant, we can choose to set $\sigma = 1$. One could also add explicit time dependence, for example choosing $m_F$
to be an increasing function of time, to describe a situation where the average propensity for consumption increases
with time.
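To make the ingredients concrete, the following sketch draws Gaussian fields and iterates the deterministic threshold rule. It is a simplified illustration under stated assumptions, not the procedure used for the results reported here: the chemical potential $b_i(t)$ is set to zero (no budget constraint), all agents are updated synchronously, and when nobody has consumed yet the success rates are taken to be uniform ($1/M$) so that the imitation signal vanishes. All function names are hypothetical.

```python
import random


def sample_fields(N, M, m_F, Sigma_F, Sigma, sigma=1.0, rng=None):
    """Draw item qualities F^alpha and agent fields h_i^alpha = h_i + dh_i^alpha."""
    rng = rng or random.Random(0)
    F = [rng.gauss(m_F, Sigma_F) for _ in range(M)]      # intrinsic quality
    h_bar = [rng.gauss(0.0, Sigma) for _ in range(N)]    # propensity h_i
    h = [[h_bar[i] + rng.gauss(0.0, sigma) for _ in range(M)]
         for i in range(N)]
    return F, h


def update(n, F, h, J):
    """One synchronous step of the threshold rule, with b_i(t) = 0 (assumption)."""
    N, M = len(h), len(F)
    counts = [sum(n[i][a] for i in range(N)) for a in range(M)]
    C = sum(counts)
    # With no consumption yet, use the uniform rate 1/M (assumption).
    phi = [c / C for c in counts] if C > 0 else [1.0 / M] * M
    return [[1 if F[a] + h[i][a] + J * M * (phi[a] - 1.0 / M) > 0 else 0
             for a in range(M)] for i in range(N)]


def run(F, h, J, max_steps=200):
    """Start from zero consumption and iterate until nothing changes.

    max_steps bounds the loop, since a synchronous update may cycle.
    """
    N, M = len(h), len(F)
    n = [[0] * M for _ in range(N)]
    for _ in range(max_steps):
        new = update(n, F, h, J)
        if new == n:
            break
        n = new
    return n


F, h = sample_fields(N=200, M=8, m_F=-2.0, Sigma_F=0.2, Sigma=1.0)
n = run(F, h, J=0.3)
print(sum(map(sum, n)) / len(n))  # average number of items per agent
```

For $J = 0$ the rule reduces to $n_i^\alpha = \Theta(F^\alpha + h_i^\alpha)$ and the iteration stops after one effective step, which gives a simple sanity check of the implementation.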

The model as defined above is extremely rich and its detailed investigation as a function of the different parameters
and budget constraints will be reported in a forthcoming publication. The most interesting question about such a
model is to know whether the realized consumption is faithful, i.e. whether or not the actual choice of the different items
reflects the 'true' preferences of individual agents, as would be the case in the absence of interactions ($J = 0$). Based
on the RFIM, we expect that this will not be the case when $J$ is sufficiently large, in which case strong distortions
will occur, meaning that the realized consumption will (i) violate the natural ordering of individual preferences
and (ii) become history dependent: a particular initial condition determines the 'winners' in an irreproducible and
unpredictable way. In order to characterize the inhomogeneity of choices, the authors of [20] have proposed and
measured different observables, in particular:

• The Gini coefficient $G$, defined as:

$$G = \frac{1}{2M} \sum_{\alpha, \beta} |\phi^\alpha - \phi^\beta|, \qquad (7)$$

which is zero if all items are equally chosen, and equal to $1 - 1/M$ if a unique item is chosen. The Gini coefficient
is a classic measure of inequality. In fact, a more relevant measure of interaction effects is the ratio $G/G_0$, where
$G_0$ is the Gini coefficient for $J = 0$.

• The unpredictability coefficient $U$, defined as:

$$U = \frac{1}{M \binom{W}{2}} \sum_{\alpha=1}^{M} \sum_{k < \ell} |\phi_k^\alpha - \phi_\ell^\alpha|, \qquad (8)$$

where the indices $k, \ell$ refer to $W$ different 'worlds', i.e. different realizations of the model with the very same
$F^\alpha$'s but a different set of $h_i^\alpha$'s (chosen with the same distribution) or different initial conditions. In the limit of a
large population ($N \to \infty$), it is easy to show that $U = 0$ when $J = 0$, since the $\phi^\alpha$ only depend on the $F^\alpha$'s.
A non-zero value of $U$, on the other hand, reveals that it is impossible to infer from the intrinsic quality of the
items the aggregate consumption profile (strong distortion).

• More detailed information is provided by the scatter plot of $\phi^\alpha$ versus $\phi^\alpha(J = 0)$; for $J$ small one expects
a nearly linear relation, whereas for larger $J$ the points acquire a larger dispersion and the average relation
becomes non-linear, indicating a substantial 'exaggeration' of the consumption of slightly better items.
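The two observables can be sketched in a few lines. The code below assumes that the success rates in each world sum to one, that the Gini coefficient is half the mean absolute difference between all pairs of items (which gives the limits 0 and $1 - 1/M$ quoted above), and that the unpredictability is the absolute difference in success rate averaged over all pairs of worlds and over items. The function names are hypothetical.

```python
from itertools import combinations


def gini(phi):
    """Gini coefficient of the success rates phi^alpha (assumed to sum to 1)."""
    M = len(phi)
    return sum(abs(a - b) for a in phi for b in phi) / (2.0 * M)


def unpredictability(worlds):
    """worlds[k][a] = success rate of item a in world k (needs W >= 2 worlds).

    Averages |phi_k - phi_l| over all pairs of worlds and over all items.
    """
    W, M = len(worlds), len(worlds[0])
    pairs = list(combinations(range(W), 2))
    total = sum(abs(worlds[k][a] - worlds[l][a])
                for a in range(M) for k, l in pairs)
    return total / (M * len(pairs))


print(gini([0.25, 0.25, 0.25, 0.25]))                      # 0.0
print(gini([1.0, 0.0, 0.0, 0.0]))                          # 0.75
print(unpredictability([[0.5, 0.5], [0.5, 0.5]]))          # 0.0
print(unpredictability([[1.0, 0.0], [0.0, 1.0]]))          # 1.0
```

The second line reproduces the value $1 - 1/M$ for $M = 4$ when a single item takes all the consumption, and identical worlds give zero unpredictability.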
We have studied these quantities both numerically and analytically within the above model. We present below
some of our numerical results, and compare them with the empirical results of [20]. Our most important analytical
result is the existence of a critical value $J_c$, below which the unpredictability $U$ is strictly zero in the limit $N \to \infty$,
and above which it becomes positive, growing as $U \propto J - J_c$ close to the transition. The fluctuations of $U$ diverge close
to $J_c$, as for standard second order phase transitions. The value of $J_c$ can be computed exactly in the limit of a large
number of items $M \gg 1$, and depends on the detailed shape of the distribution of the fields $F$ and $h$. More precisely,
$J_c$ is given by:



$$J_c = \int_{-\infty}^{+\infty} dF\, P_F(F)\, \gamma(F), \qquad (9)$$

where $\gamma(F)$ is the solution of:

and $P_F$ and $P_h$ are the distributions of the fields $F$ and $h$.



III. THE WEB-BASED EXPERIMENT OF SALGANIK ET AL. 

Here we describe the beautiful experimental set-up of M. J. Salganik, P. S. Dodds and D. J. Watts [23], which
allows them to conclude that social influence has a determinant effect on the choices of individual agents. In the next
section, we will in fact use their quantitative results to measure, within the above theoretical framework, the strength
of the social influence factor $J$. Salganik et al. [23] have created an artificial "music market" on the Web with $M = 48$
songs from essentially unknown bands, in which 14,341 participants (mostly teenagers) took part. Songs are presented on a
screen and participants make decisions about which songs to listen to and, in a second step, whether they want to
download the song they listened to. Participants are randomly assigned to one of the three following situations:

• an independent (zero-influence) situation where the list of songs carries no mention of the songs downloaded by
other participants. This situation allows one to define a benchmark, where an 'intrinsic' mix between the quality of
the songs and the preference of the participants can be measured. This situation corresponds to $J = 0$ in the
model above;

• a 'weak' social influence situation. In this case, the number of times a given song has been downloaded by other
participants is shown. However, the songs are presented in random order so that the ranking of the preference
of other participants is not obvious at first glance. This situation corresponds to a certain small value $J_1 > 0$ in
the model above;

• a 'strong' social influence situation. In this case, the list of songs is presented by decreasing number of downloads,
so as to emphasize the preferences expressed by previous participants. This situation corresponds to a certain
value $J_2 > J_1 > 0$ in the model above.

Furthermore, in both social influence conditions participants are randomly assigned to $W = 8$ different worlds, each
one with its own history and evolving independently from one another, but with the same initial conditions, i.e. zero
downloads. For each of the two influence conditions, the outcomes (i.e. the number of downloads of all songs) are
compared to the independent, zero-influence situation. In this way, the authors are able to conclude that increasing
the strength of social influence increases both the inequality $G$ and the unpredictability of success $U$ [20].

Because these experiments look very much like those in physical laboratories, we believe that they could play
an important role in the development of scientific investigations of collective human behavior. The Web gives the
opportunity to devise and perform large scale experimentation (see also [?]), with a number of participants that
allows one to extract meaningful statistical information. We expect that many other experiments of the same type
will be conducted in the future. In the present case, the experiment is very carefully thought through to remove
many artefacts: for example, download is free (no consideration of the wealth of participants is required, i.e. no 'budget
constraint') and anonymous (no direct social pressure is involved); participants are not rewarded for having made a
'good' or 'useful' choice; songs and bands are not well known (avoiding strong a priori biases), etc.



IV. MODEL CALIBRATION: TOWARDS A MEASUREMENT OF SOCIAL INFLUENCE? 

We now turn to a semi-quantitative analysis of the empirical data collected by Salganik et al. [23]. Once the
distributions of the $F^\alpha$'s and $h_i^\alpha$'s are fixed (we chose them to be Gaussian for simplicity), the model depends on four
parameters: $m_F$, $\Sigma_F$, $\Sigma$ and the social influence $J$. These values must be chosen so as to reproduce the observations
reported in [23], namely:

• The Gini coefficient $G_0$, the unpredictability $U_0$ and the qualitative shape of the distribution of $\phi_0^\alpha$ in the
independent situation, corresponding to $J = 0$.

• The Gini coefficient $G$, the unpredictability $U$ and the qualitative shape of the relation between $\phi^\alpha$ and $\phi_0^\alpha$ in
the social influence conditions.

FIG. 1: Gini coefficient as a function of $J$ for the choice $m_F \approx -2$, $\Sigma_F \approx 0.2$, $\Sigma = 1$, and for different numbers of agents
$N = 700$, 7000 and 70000. Note the rather weak dependence on $N$ of this quantity. The empirical values of $G$ in the three
different situations are: $G_0 \approx 0.22$ (no imitation), $G_1 \approx 0.35$ (weak imitation) and $G_2 \approx 0.5$ (strong imitation).

FIG. 2: Unpredictability $U$ as a function of $J$ for the choice $m_F \approx -2$, $\Sigma_F \approx 0.2$, $\Sigma = 1$, and for different numbers of agents
$N = 700$ (•), 7000 (□) and 70000 (○). In this case, the finite size effects are strong; one in fact expects $U$ to be zero for $J < J_c \approx 0.29$
(dashed vertical line), and to grow linearly for small $J - J_c > 0$. The empirical values of $U$ for $N = 700$ and in the three
different situations are: $U_0 \approx 0.0045$ (no imitation), $U_1 \approx 0.008$ (weak imitation) and $U_2 \approx 0.013$ (strong imitation). This last
case corresponds, for $N = 700$, to $J_2 \approx J_c$.

Quite a lot more data is reported in the supplementary material of [20], for example the average number of
downloaded songs per participant $d = C/N$. In fact, the situation of [23] is slightly more complicated than assumed
in the above model because each participant makes a two-step decision. Participants, before possibly downloading a
song, first choose to listen to it. These two decisions may be correlated and both influenced by the choice of other
participants. The authors of [23] report separate statistics for the number of downloaded songs and the number of
'tested' songs. In order to reproduce these results in full detail, one must generalize the above model, for example by
assuming that the number of downloads of song $\alpha$ by agent $i$ can be written as:

$$n_i^\alpha(t+1) = \xi_i^\alpha\, \Theta\!\left[ F^\alpha + h_i + \delta h_i^\alpha + J M \left( \phi^\alpha(t) - \frac{1}{M} \right) \right], \qquad (11)$$

where $\xi_i^\alpha = 1$ with a certain probability and $0$ otherwise, describing the decision of actually downloading a song after
listening to it. Although the inclusion of this second decision step is crucial to account fully for the results of [20],
we neglect this aspect altogether in the present paper and refer the reader to a later, more detailed publication [21].
Here we want to show that the main empirical features can indeed be reproduced by the model.

FIG. 3: Scatter plot of the realized preferences $\phi^\alpha(J)$ as a function of the 'intrinsic' preferences $\phi_0^\alpha$, in the weak social influence
condition ($J_1 = 0.17$, left), and in the strong social influence condition ($J_2 = 0.30$, right), all for $m_F \approx -2$, $\Sigma_F \approx 0.2$, $\Sigma = 1$.
Lines are linear regressions. These plots compare well with the corresponding plots of [20] (Figs. 3-A and 3-C; we use here the
same scale as in [20]).

Different choices of $m_F$, $\Sigma_F$, $\Sigma$ are in fact compatible with the observations corresponding to $J = 0$, for which
Salganik et al. find $G_0 \approx 0.22$ and $U_0 \approx 0.0045$ (for a number of participants in each 'world' of $N = 700$, the value
we also use in our numerical simulations). A possible choice (further justified in [21]) is: $m_F \approx -2$, $\Sigma_F \approx 0.2$,
$\Sigma = 1$. The resulting shape of the distribution of $\phi_0^\alpha$ is found to be compatible with the data of [23]. Note that
$\Sigma_F^2 = 0.04 < \Sigma^2 + \sigma^2 = 2$, suggesting that the intrinsic quality of songs is less dispersed than the preference of agents.
This is expected in a situation where songs and bands are unknown, leading to very small a priori information on
their intrinsic quality.
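The comparison of variances can be checked with a few lines of arithmetic. The sketch below also evaluates, as an illustration only, the probability that an agent consumes an item of quality $F$ when $J = 0$ and no budget constraint is imposed (chemical potential set to zero, an assumption), in which case the agent field is Gaussian with zero mean and variance $\Sigma^2 + \sigma^2$. The helper names are hypothetical.

```python
import math

Sigma_F, Sigma, sigma = 0.2, 1.0, 1.0

var_quality = Sigma_F ** 2            # dispersion of song quality
var_taste = Sigma ** 2 + sigma ** 2   # dispersion of agent preferences
print(round(var_quality, 6), var_taste, var_quality < var_taste)


def normal_cdf(x):
    """Cumulative distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_consume(F, Sigma=1.0, sigma=1.0):
    """P(F + h > 0) for Gaussian h with zero mean and variance Sigma^2 + sigma^2.

    Valid only for J = 0 and b_i = 0 (assumptions of this sketch).
    """
    return normal_cdf(F / math.sqrt(Sigma ** 2 + sigma ** 2))


# Consumption probability for an item of average quality m_F = -2.
print(p_consume(-2.0))
```

With these parameter values an item of average quality is chosen by a small fraction of agents, and the narrow spread of $F^\alpha$ translates into moderate differences in success rate between items at $J = 0$.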

Now, it is interesting to see how $G$ and $U$ are affected by a non-zero value of $J$; cf. Figs. 1 and 2. From these
plots, one sees that the 'weak' social influence situation, characterized by $G_1 \approx 0.35$ and $U_1 \approx 0.008$ [20], corresponds
to $J_1 \approx 0.17$. On the other hand, the 'strong' influence situation yields $G_2 \approx 0.5$ and $U_2 \approx 0.013$ [20], which we
can account for by setting $J_2 \approx 0.30$. The scatter plots of $\phi^\alpha$ vs. $\phi_0^\alpha$ are shown in Figs. 3-a and 3-b and can be
satisfactorily compared to Figs. 3-A and 3-C of [20].

It is of particular interest to compare the above values of $J_1$ and $J_2$ to the critical value $J_c$ of the model, which can
be determined exactly as a function of $m_F$, $\Sigma_F$, $\Sigma$ in the limit $M \to \infty$ [21]. In the present case, we find $J_c \approx 0.29$,
such that, in the limit $N \to \infty$, $U(J < J_c)$ should be strictly zero. As expected on general grounds and shown in
Fig. 2, the value of $U$ at finite $N$ suffers from large finite size effects. Only a careful extrapolation for $N \to \infty$ allows
one to confirm the existence of a critical value $J_c$ [21]. But in any case, the value $J_2$ accounting for the data in the
'strong' influence situation is indeed quite large, since it corresponds to the critical region where imitation effects
become dominant.

Another effect worth noticing is the dependence of the average number of downloaded songs $d$ (or consumption
$C = Nd$) on the imitation parameter $J$, predicted by the model and reported in Fig. 4. We see that this quantity has
a clear maximum as a function of $J$: at first, imitation effects tend to increase the total consumption until $J \approx 1$,
beyond which over-polarisation on a small number of items becomes such that the total consumption goes back down.
This might have interesting consequences for marketing policies, for example (see e.g. [?]). The increase of
$d$ with $J$ is actually not observed in [23]; see [?] for a further discussion of this point.



V. CONCLUSIONS 



We have proposed a generic model for multiple choice situations with imitation effects and compared it with recent
empirical results from a Web-based cultural market experiment. Our model predicts a phase transition between a
weak imitation phase, in which expressed individual preferences are close to their value in the absence of any direct
social pressure, and a strong imitation, 'fashion' phase, where choices are driven by peer pressure and the ranking of
individual preferences is strongly distorted at the aggregate level. The model can be calibrated to reproduce the main
experimental results of Salganik et al. [20]; we show in particular that the value of the social influence parameter can
be estimated from the data. In one of the experimental situations, this value is found to be close to the critical value
of the model, confirming quantitatively that social pressure is strong in that case. This concurs with the conclusions
of [?], who also found near critical values of the social influence parameter.

FIG. 4: Average number of downloaded songs $d$ (or consumption $C = Nd$) as a function of $J$ for the choice $m_F \approx -2$, $\Sigma_F \approx 0.2$,
$\Sigma = 1$, and for different numbers of agents $N = 700$ (•), 7000 (□) and 70000 (○). Finite size effects are quite small in this case. Note the
clear maximum of this quantity as a function of the imitation strength $J$.

Our model can be transposed to many interesting situations, for example that of industrial production, for which
one expects a transition between an archaic economy dominated by very few products and a fully diversified economy
as the dispersion of individual needs becomes larger. We leave the investigation of these questions, and the detailed
analytical investigation of our model, for a further publication. We believe that the simultaneous development of
theoretical models and detailed, rigorous experiments in the vein of [20] or [23, 26] will help promote a quantitative
understanding of collective human (and animal) behaviour.